[SOUND] This lecture is
a summary of this course.
This map shows the major topics
we have covered in this course.
And here are some key
high-level take-away messages.
First, we talked about natural
language content analysis.
Here the main take-away message is that natural
language processing is the foundation for
text retrieval, but
current NLP isn't robust enough.
So the bag-of-words
representation is generally
the main method used in
modern search engines, and
it's often sufficient for
most search tasks.
But obviously, for
more complex search tasks,
we need deeper natural language
processing techniques.
And we then talked about
high-level strategies for
text access, and we talked about
push versus pull.
In pull, we talked about querying
versus browsing.
Now, in general, future search engines
should integrate
all these techniques to provide
multiple modes of information access.
Then we talked about a number of
issues related to search engines.
We talked about the search problem and
we framed that as a ranking problem and
we talked about a number
of retrieval methods.
We started with an overview of
the vector space model and
the probabilistic model, and then we talked
about the vector space model in depth.
We also later talked about
the language modeling approach, and
that's a probabilistic model.
And here, the main take-away message is
that modern retrieval functions tend to
look similar, and
they generally use various heuristics.
The most important ones are TF-IDF weighting
and document length normalization,
and TF is often transformed through
a sub-linear transformation function.
Then we talked about how to
implement a retrieval system.
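
To make the three heuristics just listed concrete, here is a minimal sketch of a BM25-style term weight. The function name and the default values of `k1` and `b` are assumptions chosen for illustration, and this is only one of several retrieval functions that combine these heuristics in a similar way.

```python
import math

def bm25_term_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Weight of one query term in one document (illustrative BM25-style sketch).

    tf: term count in the document, df: number of documents containing the term.
    k1 and b are tuning parameters; the defaults here are assumptions.
    """
    if tf == 0:
        return 0.0
    # IDF weighting: terms that occur in fewer documents get a higher weight.
    idf = math.log((n_docs + 1) / df)
    # Document length normalization: longer-than-average documents are penalized.
    norm = 1 - b + b * doc_len / avg_doc_len
    # Sub-linear TF transformation: the gain from each extra occurrence shrinks,
    # and this factor never exceeds k1 + 1.
    tf_part = (k1 + 1) * tf / (tf + k1 * norm)
    return idf * tf_part
```

A document score for a query would then be the sum of these weights over the query terms that the document matches.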
And here the main technique that we talked
about is how to construct an inverted index,
so that we can prepare the system
to answer a query quickly, and
we talked about how to do fast search
by using the inverted index.
We then talked about how to
evaluate a text retrieval system, and
mainly introduced the Cranfield
evaluation methodology.
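
The inverted index mentioned above can be sketched in a few lines. This is a toy in-memory version under simplifying assumptions (whitespace tokenization, raw term-frequency scoring, hypothetical function names); a real system would compress the postings and store them on disk.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted list of (doc_id, term_frequency) postings."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}

def search(index, query):
    """Score only the documents that appear in the postings of a query term."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, tf in index.get(term, []):
            scores[doc_id] += tf
    # Highest score first; ties are broken by the smaller document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = [
    "text retrieval and search engines",
    "search engines rank documents",
    "text mining finds patterns in text",
]
index = build_inverted_index(docs)
results = search(index, "text search")
```

The point of the structure is that a query touches only the postings lists of its own terms, so documents that match no query term are never examined.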
The Cranfield methodology is a very important
evaluation methodology that can be applied to
many tasks.
We talked about the major
evaluation measures.
So the most important measures for
a search engine are MAP, mean
average precision, and nDCG,
normalized discounted cumulative
gain, and also precision and
recall, the two basic measures.
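
As a small illustration of the measures just named, here is a sketch of average precision, MAP, and nDCG. It assumes one common form of the discount, dividing the gain at each rank by `log2(rank + 1)`, and it computes the ideal ranking only from the gains in the given list; other formulations differ slightly in both respects.

```python
import math

def average_precision(ranked_rels, n_relevant):
    """ranked_rels: 0/1 judgments in ranked order; n_relevant: total relevant docs."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at each relevant document
    return total / n_relevant if n_relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_rels, n_relevant) pairs, one per query."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (an assumption)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG divided by the DCG of the ideal reordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0
```

Average precision works with binary judgments, while nDCG can use multi-level relevance judgments such as 0 to 3.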
And we then talked about
feedback techniques.
And we talked about Rocchio
in the vector space model and
the mixture model in
the language modeling approach.
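
The Rocchio idea can be sketched as moving the query vector toward the centroid of the relevant documents and away from the centroid of the non-relevant ones. In this sketch the vectors are plain dictionaries, the default values of `alpha`, `beta`, and `gamma` are assumptions, and dropping terms whose weight becomes non-positive is a common practical choice rather than part of the formula itself.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector; all vectors are dicts of term -> weight."""
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)

    def centroid_weight(docs, term):
        # Average weight of the term over a set of document vectors.
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    new_query = {}
    for term in terms:
        weight = (alpha * query.get(term, 0.0)
                  + beta * centroid_weight(relevant, term)
                  - gamma * centroid_weight(non_relevant, term))
        if weight > 0:  # assumption: discard non-positive weights
            new_query[term] = weight
    return new_query

updated = rocchio(
    {"apple": 1.0},
    relevant=[{"apple": 1.0, "fruit": 1.0}],
    non_relevant=[{"apple": 1.0, "computer": 1.0}],
)
```

In this toy example the updated query gains the term "fruit" from the relevant document, and "computer" is left out because its weight would be negative.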
Feedback is a very important
technique, especially considering
the opportunity of learning from
a lot of clickthroughs on the web.
We then talked about web search.
And here, we talked about how to
use parallel indexing to resolve
the scalability issue in indexing.
We introduced MapReduce, and
then we talked about how to use
link information to improve search.
We talked about PageRank and
HITS as the major algorithms
to analyze links on the web.
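
As a reminder of how link analysis works, here is a minimal sketch of PageRank computed by repeated score propagation. The damping value, the fixed number of iterations, and the treatment of pages without outgoing links (their score is spread evenly over all pages) are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random jump: every page receives a small share of score.
        new_rank = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Following links: split this page's score among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page with no out-links: spread its score over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]})
```

In this small graph, page "c" collects links from three pages and ends up with the highest score, while "d", which nobody links to, gets only the random-jump share.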
We then talked about learning to rank.
This is a use of machine learning
to combine multiple features for
improving scoring.
Not only can the effectiveness be
improved using this approach, but
we can also improve the robustness
of the ranking function,
so that it's not easy to spam
a search engine with just
some features to promote a page.
And finally,
we talked about the future of web search.
We talked about some major
directions that we might see
in the future in improving the current
generation of search engines.
And then finally, we talked about
recommender systems, and these are systems
to implement the push mode, and
we talked about two approaches.
One is content-based,
one is collaborative filtering, and
they can be combined together.
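
To illustrate the collaborative filtering approach, here is a minimal user-based sketch: a missing rating is predicted as the similarity-weighted average of the ratings that other users gave to the same item. Using cosine similarity on raw, positive ratings is an assumption made to keep the example short; mean-centered ratings with a correlation-based similarity are another common choice.

```python
import math

def cosine(u, v):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Predict user's rating of item from other users who rated that item."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None  # None: no usable neighbor

ratings = {
    "ann": {"x": 5, "y": 1},
    "bob": {"x": 5, "y": 1, "z": 5},
    "cat": {"x": 1, "y": 5, "z": 1},
}
prediction = predict(ratings, "ann", "z")
```

Here "ann" rates items the way "bob" does, so the prediction for item "z" is pulled toward bob's high rating rather than cat's low one.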
Now an obvious missing piece in this
picture is the user, you can see.
So the user interface is also an important
component in any search engine,
even though the current search
interface is relatively simple.
There actually have been a lot
of studies of user interfaces
related to visualization, for
example, and this is a topic that
you can learn more about by reading this book.
It's an excellent book about all kinds
of studies of search user interfaces.
If you want to know more about
the topics that we talked about,
you can also read some additional
readings that are listed here.
In this short course, we have only managed
to cover some basic topics in text
retrieval and search engines.
And these resources provide additional
information about more advanced topics and
they give more thorough treatment of
some of the topics that we talked about.
And a main source is
the Synthesis Digital Library,
where you can see a lot
of short textbooks
or long tutorials.
They tend to provide us with a lot of
information to explain a topic and
there are multiple series that
are related to this course.
One is Information Concepts,
Retrieval, and Services.
Another is Human Language Technologies, and
yet another is Artificial
Intelligence and Machine Learning.
There are also some major journals and
conferences listed over here that
tend to have a lot of research papers
related to the topic of this course.
And finally, for
more information about resources,
including readings and toolkits, etc.,
you can check out this URL.
So, if you have not taken
the text mining course
in this data mining specialization series,
then naturally,
the next step is to take that course.
As this picture shows,
to mine text data,
we generally need two kinds of techniques.
One is text retrieval,
which is covered in this course.
And these techniques will help us
convert raw big text data into small,
relevant text data, which are actually
needed in the specific application.
And humans play an important
role in mining any text data,
because text data is written for
humans to consume.
So, involving humans in the process
of data mining is very important.
And in this course,
we have covered various strategies to
help users get access to
the most relevant data.
These techniques are also essential
in any text mining system to help
provide provenance and
to help users interpret the
patterns that the user would
find through text data mining.
So, in general, the user would have to
go back to the original data to better
understand the patterns.
So the text mining course, or
rather the text mining and
analytics course, will be
dealing with what to do once
the user has found the information.
So this is the part of this picture
where we would convert
the text data into actionable knowledge.
And this has to do with helping
users to further digest
the found information, or
to find the patterns and
to reveal knowledge buried in text, and
such knowledge can be used in
application systems to help decision-making
or to help users finish a task.
So, if you have not taken that
course,
the natural next step would
be to take that course.
Thank you for taking this course.
I hope you have found this
course to be useful to you, and
I look forward to interacting
with you in a future activity.
[MUSIC]

